No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Celli, Andrea

Neural Information Processing Systems

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium.
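The 20-year-old result the abstract refers to is Hart and Mas-Colell's internal-regret-matching procedure. A minimal sketch of it is below; the game (a "chicken"-style payoff matrix), the normalization constant, and the horizon are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-player "game of chicken"; payoff matrices are an illustrative
# choice, indexed as payoffs[player][row_action, col_action].
payoffs = [np.array([[6.0, 2.0], [7.0, 0.0]]),   # row player
           np.array([[6.0, 7.0], [2.0, 0.0]])]   # column player

T, n_actions, mu = 20000, 2, 14.0   # mu > payoff range * (n_actions - 1)
# R[i][j, k]: cumulative gain player i would have had from playing k
# in every round in which it actually played j (conditional regret).
R = [np.zeros((n_actions, n_actions)) for _ in range(2)]
last = [0, 0]
counts = np.zeros((n_actions, n_actions))   # empirical joint play

for t in range(T):
    acts = []
    for i in range(2):
        j = last[i]
        # Switch to k with probability proportional to positive regret.
        p = np.maximum(R[i][j], 0.0) / (mu * (t + 1))
        p[j] = 0.0
        p[j] = 1.0 - p.sum()        # stay put with the remaining mass
        acts.append(rng.choice(n_actions, p=p))
    counts[acts[0], acts[1]] += 1.0
    for i in range(2):
        j, opp = acts[i], acts[1 - i]
        for k in range(n_actions):
            u_k = payoffs[i][k, opp] if i == 0 else payoffs[i][opp, k]
            u_j = payoffs[i][j, opp] if i == 0 else payoffs[i][opp, j]
            R[i][j, k] += u_k - u_j
    last = acts

empirical = counts / T   # converges to a correlated equilibrium
```

Each player updates using only its own payoffs and the realized play, so the dynamics are uncoupled; the empirical joint frequency `empirical` approaches the set of normal-form correlated equilibria as T grows.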

Review for NeurIPS paper: No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Neural Information Processing Systems

Summary and Contributions: The authors provide a regret-minimisation approach to computing an analogue of correlated equilibria in extensive-form games, called extensive-form correlated equilibria (EFCE). It was previously unknown whether EFCE can be achieved via uncoupled no-regret dynamics, as is typical for correlated equilibria in simultaneous games; the authors provide a way of doing so by introducing an appropriate notion of regret in the extensive-form setting (one that lines up with the notion of approximation in approximate EFCE) and demonstrating that achieving low regret in this sense suffices for the joint strategy profiles arising from empirical frequencies of play to form an approximate EFCE.

As mentioned before, the relevant notion of equilibrium in this setting is the extensive-form correlated equilibrium (EFCE). Such an equilibrium is a joint distribution over the space of all possible plans of all agents. As is typical in an extensive-form setting, a plan is simply a mapping from information sets to their relevant actions that dictates what an agent does in any given situation of play. In the simultaneous setting, a mixed strategy profile is a correlated equilibrium when no agent wishes to deviate from the joint strategy profile, conditional on their realised strategy profile and prior knowledge of the joint distribution of play.
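The reviewer's description of a plan can be made concrete with a small sketch; the information-set names, actions, and the toy correlation device below are illustrative, not from the paper under review:

```python
from itertools import product

# Two information sets for one player; names and actions are
# illustrative placeholders.
infosets = {"I1": ["a", "b"], "I2": ["x", "y"]}

# A pure plan fixes one action at every information set.
plans = [dict(zip(infosets, choice))
         for choice in product(*infosets.values())]

# An EFCE is a probability distribution over *joint* plans (one plan
# per player); here, a toy correlation device for two players with
# identical plan sets that always recommends matching plans.
joint_distribution = {(p_id, p_id): 1.0 / len(plans)
                      for p_id in range(len(plans))}
```

With two binary information sets there are 2 × 2 = 4 pure plans; in an actual EFCE the device samples a joint plan from such a distribution and reveals each recommendation only when the corresponding information set is reached.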



Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games

Anagnostides, Ioannis, Farina, Gabriele, Sandholm, Tuomas

arXiv.org Artificial Intelligence

In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of extensive-form correlated equilibria and coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with polynomial degree, a property that we establish for the fixed points of (coarse) trigger deviation functions. Moreover, our construction leverages a refined regret circuit for the convex hull, which -- unlike prior guarantees -- preserves the RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation is of independent interest for establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.
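As a rough illustration of the optimistic, RVU-style self-play dynamics this line of work builds on, here is optimistic multiplicative weights (a standard algorithm satisfying the RVU property) in a toy zero-sum matrix game. This is a normal-form stand-in for intuition only, not the paper's extensive-form construction; the game, step size, and horizon are illustrative choices:

```python
import numpy as np

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])   # rock-paper-scissors (row payoffs)
eta, T = 0.1, 5000

def optimistic_hedge(x, loss, prev_loss):
    # Optimistic update: use last round's loss as a prediction of this one.
    w = x * np.exp(-eta * (2.0 * loss - prev_loss))
    return w / w.sum()

x = np.array([0.5, 0.3, 0.2])     # row player's strategy
y = np.array([0.2, 0.5, 0.3])     # column player's strategy
lx_prev, ly_prev = np.zeros(3), np.zeros(3)
cum_lx, cum_play_loss, avg_x = np.zeros(3), 0.0, np.zeros(3)

for t in range(T):
    lx, ly = -A @ y, A.T @ x       # each player's loss vector this round
    cum_play_loss += x @ lx
    cum_lx += lx
    avg_x += x / T
    x = optimistic_hedge(x, lx, lx_prev)
    y = optimistic_hedge(y, ly, ly_prev)
    lx_prev, ly_prev = lx, ly

# Row player's external regret against the best fixed action.
regret = cum_play_loss - cum_lx.min()
```

In self-play the predictions are accurate because the opponent's strategy changes slowly, which is what lets optimistic dynamics beat the worst-case $\Theta(\sqrt{T})$ regret; here the average play also approaches the uniform Nash equilibrium of rock-paper-scissors.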


Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

Bai, Yu, Jin, Chi, Mei, Song, Song, Ziang, Yu, Tiancheng

arXiv.org Artificial Intelligence

A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the \emph{$\Phi$-Hedge} algorithm -- a generic algorithm capable of learning a large class of equilibria for NFGs. We show that $\Phi$-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the \emph{$\Phi$-Hedge} algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time. This new connection further allows us to design and analyze a new class of OMD algorithms based on modifying its log-partition function. In particular, we design an improved algorithm with balancing techniques that achieves a sharp $\widetilde{\mathcal{O}}(\sqrt{XAT})$ EFCE-regret under bandit-feedback in an EFG with $X$ information sets, $A$ actions, and $T$ episodes. To the best of our knowledge, this is the first such rate and matches the information-theoretic lower bound.
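The Hedge-equals-OMD equivalence claimed here has a familiar normal-form analogue: vanilla Hedge coincides with OMD under the negative-entropy regularizer (the dilated version of which is what the paper uses for EFGs). A minimal numerical check of the normal-form identity, with arbitrary toy losses (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
eta, n, T = 0.5, 4, 50
losses = rng.uniform(0.0, 1.0, size=(T, n))   # arbitrary toy losses

# Hedge: exponential weights over n actions.
w = np.ones(n)
hedge_iterates = []
for l in losses:
    hedge_iterates.append(w / w.sum())
    w = w * np.exp(-eta * l)

# OMD with the negative-entropy mirror map on the simplex; its update
# has the closed form x_{t+1} proportional to x_t * exp(-eta * l_t).
x = np.ones(n) / n
omd_iterates = []
for l in losses:
    omd_iterates.append(x)
    g = x * np.exp(-eta * l)
    x = g / g.sum()

max_diff = np.max(np.abs(np.array(hedge_iterates) - np.array(omd_iterates)))
```

Both recursions produce the distribution proportional to $\exp(-\eta \sum_{s<t} \ell_s)$, so the iterate sequences coincide up to floating-point error; the paper's contribution is lifting this identity from the simplex to EFG strategy polytopes via dilated regularizers.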